
Snorkel: Rapid Training Data Creation with Weak Supervision



Abstract

Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8x faster and increase predictive performance by an average of 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8x speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
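To make the idea of labeling functions concrete, the following is a minimal sketch in plain Python. It does not use the Snorkel library or its API. The function names, the example sentences, and the drug side-effect task are all hypothetical, chosen only for illustration. The combination step is an unweighted majority vote, a deliberately simple stand-in: the system described in the abstract instead denoises the votes by modeling the unknown accuracies and correlations of the labeling functions, which this sketch does not attempt.

```python
from typing import Callable, List

# Label convention assumed for this sketch: +1, -1, or 0 for "abstain".
POSITIVE, NEGATIVE, ABSTAIN = 1, -1, 0

LabelingFunction = Callable[[str], int]


def lf_mentions_causes(text: str) -> int:
    """Hypothetical heuristic: the word 'causes' suggests a side effect."""
    return POSITIVE if "causes" in text.lower() else ABSTAIN


def lf_negation(text: str) -> int:
    """Hypothetical heuristic: an explicit 'not' suggests no side effect."""
    return NEGATIVE if " not " in f" {text.lower()} " else ABSTAIN


def lf_mentions_treats(text: str) -> int:
    """Hypothetical heuristic: 'treats' describes a therapy, not a side effect."""
    return NEGATIVE if "treats" in text.lower() else ABSTAIN


def apply_lfs(lfs: List[LabelingFunction], docs: List[str]) -> List[List[int]]:
    """Build the label matrix: one row per document, one column per function."""
    return [[lf(doc) for lf in lfs] for doc in docs]


def majority_vote(row: List[int]) -> int:
    """Combine one row of votes; ties and all-abstain rows yield ABSTAIN.

    This is a simple baseline, not the denoising model from the paper.
    """
    total = sum(row)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN


lfs = [lf_mentions_causes, lf_negation, lf_mentions_treats]
docs = [
    "Aspirin causes stomach bleeding",
    "Aspirin does not cause headaches",
    "Aspirin treats headaches",
    "Aspirin is a drug",
]

label_matrix = apply_lfs(lfs, docs)
labels = [majority_vote(row) for row in label_matrix]
```

Each heuristic is noisy and covers only some documents, which is why a document on which every function abstains (the last one here) receives no training label at all. A model trained on the combined labels would then be expected to generalize beyond the heuristics themselves.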
